6 research outputs found

    Automatic handwriter identification using advanced machine learning

    Handwriter identification is a challenging problem, especially for forensic investigation. The topic has received significant attention from the research community, and several handwriter identification systems have been developed for applications including forensic science, document analysis and the investigation of historical documents. This work is part of an investigation to develop new tools and methods for Arabic palaeography, which is the study of handwritten material, particularly ancient manuscripts with missing writers, dates and/or places. The main aim of this research project is to investigate and develop new techniques and algorithms for the classification and analysis of ancient handwritten documents to support palaeographic studies. Three contributions are proposed. The first is a text line extraction algorithm for colour and greyscale historical manuscripts. It uses a modified bilateral filtering approach to adaptively smooth the images while preserving the edges through a nonlinear combination of neighbouring image values. The algorithm computes a median seam and a separating seam, and has been validated on both greyscale and colour historical documents using different datasets. The results suggest that the technique compares favourably with a few similar algorithms. The second contribution combines Oriented Basic Image features with the concept of a grapheme codebook in order to improve recognition performance. The proposed algorithm is capable of effectively extracting the most distinguishing patterns of a handwriter. The idea is to combine multiscale feature extraction with the concept of graphemes so that several discriminating features can be extracted, such as handwriting curvature, direction, wrinkliness and various edge-based features. The technique was validated for identifying handwriters using both Arabic and English writing captured as scanned images, with the IAM dataset for English handwriting and the ICFHR 2012 dataset for Arabic handwriting. The results demonstrate the effectiveness of the proposed method when compared against some similar techniques. The third contribution is an offline handwriter identification approach based on convolutional neural networks (CNNs). In the first stage, the Alex-Net architecture is employed to learn image features from handwritten scripts, with the features taken from the fully connected layers of the model. A support vector machine (SVM) classifier is then deployed to classify the writing styles of the various handwriters. In this way, test scripts are first processed by the trained CNN and then passed on for further classification. The proposed approach was evaluated on two Arabic historical datasets: the Islamic Heritage Project (IHP) and the Qatar National Library (QNL). The results demonstrate that the proposed model achieved superior performance when compared to some similar methods.
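
    The edge-preserving smoothing mentioned for the first contribution can be illustrated with a standard bilateral filter. The sketch below is a minimal, pure-Python version on a toy greyscale image. It is not the modified filter from the thesis, and the function name, parameters (radius, sigma_space, sigma_range) and pixel values are assumptions chosen for illustration.

```python
import math

def bilateral_filter(image, radius=1, sigma_space=1.0, sigma_range=25.0):
    """Smooth a greyscale image (list of rows of numbers) while preserving edges.

    Standard bilateral filter: each output pixel is a weighted average of its
    neighbours, where the weight shrinks with spatial distance AND with the
    difference in intensity. Parameter values here are illustrative only.
    """
    height, width = len(image), len(image[0])
    output = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            centre = image[y][x]
            weighted_sum = 0.0
            weight_total = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = y + dy, x + dx
                    if not (0 <= ny < height and 0 <= nx < width):
                        continue  # skip neighbours outside the image
                    neighbour = image[ny][nx]
                    # Spatial weight: nearby pixels count more.
                    spatial = math.exp(-(dx * dx + dy * dy) / (2 * sigma_space ** 2))
                    # Range weight: pixels with similar intensity count more.
                    tonal = math.exp(-((neighbour - centre) ** 2) / (2 * sigma_range ** 2))
                    weight = spatial * tonal
                    weighted_sum += weight * neighbour
                    weight_total += weight
            output[y][x] = weighted_sum / weight_total
    return output

# Toy image (hypothetical values): a dark region next to a bright region.
image = [
    [10, 12, 9, 200, 202],
    [11, 10, 13, 199, 201],
    [9, 11, 10, 203, 200],
]
smoothed = bilateral_filter(image)
```

    Because the range weight is close to zero across the dark/bright boundary, pixels on each side are averaged almost only with pixels from their own side, so the edge stays sharp while small variations within each region are smoothed.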

    Writer identification approach based on bag of words with OBI features

    Handwriter identification aims to simplify the task of forensic experts by providing them with semi-automated tools that narrow down the search for the writer of an unknown handwritten sample. An identification algorithm produces a list of predicted writers of the unknown sample, ranked by a confidence measure, for use by the forensic expert, who makes the final decision. Most existing handwriter identification systems use either statistical or model-based approaches. To further improve performance, this paper proposes a combination of both approaches using Oriented Basic Image features and the concept of a grapheme codebook. Kernel Principal Component Analysis is used to reduce the resulting high dimensionality of the feature vector. To gauge the effectiveness of the proposed method, a performance analysis was carried out using the IAM dataset for English handwriting and the ICFHR 2012 dataset for Arabic handwriting. The method achieved an accuracy of 96%, demonstrating its superiority over similar techniques.
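
    The codebook idea and the ranked list of candidate writers can be sketched as follows. This is a minimal illustration under stated assumptions: the two-dimensional feature vectors, the three-entry codebook, the writer names and the chi-square distance are all hypothetical stand-ins, and the sketch does not implement Oriented Basic Image features or Kernel PCA.

```python
import math

def nearest_codeword(feature, codebook):
    """Index of the codeword closest to the feature (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda i: math.dist(feature, codebook[i]))

def codebook_histogram(features, codebook):
    """Normalised histogram of codeword occurrences for one handwriting sample."""
    counts = [0] * len(codebook)
    for feature in features:
        counts[nearest_codeword(feature, codebook)] += 1
    total = sum(counts)
    return [count / total for count in counts]

def chi_square(p, q):
    """Chi-square distance between two histograms (smaller means more similar)."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)

def rank_writers(query_hist, reference_hists):
    """Writer names sorted from most to least similar to the query histogram."""
    return sorted(reference_hists, key=lambda name: chi_square(query_hist, reference_hists[name]))

# Hypothetical codebook of three "graphemes" in a toy 2-D feature space.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]

# Hypothetical reference histograms for two known writers.
reference_hists = {
    "writer_a": [0.6, 0.3, 0.1],
    "writer_b": [0.1, 0.2, 0.7],
}

# Hypothetical features extracted from an unknown sample.
query_features = [(0.1, 0.0), (0.0, 0.2), (0.9, 0.1), (0.1, 0.9), (0.2, 0.1)]
query_hist = codebook_histogram(query_features, codebook)
ranking = rank_writers(query_hist, reference_hists)
```

    The ranking, rather than a single label, matches the workflow described in the abstract, where the expert inspects the top candidates and makes the final decision.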

    A Comparative Study of Machine Learning Approaches for Handwriter Identification

    During the past few years, writer identification has attracted significant interest due to its real-life applications, including document analysis and forensics. Machine learning algorithms have played an important role in the development of writer identification systems and have demonstrated very effective performance. Recently, the emergence of deep learning has led to various systems in computer vision and pattern recognition applications. This work therefore aims to assess and compare the performance of one deep learning algorithm, the AlexNet model, with two of the most effective machine learning classification approaches: Support Vector Machine (SVM) and K-Nearest-Neighbour (KNN). The evaluation was conducted using the IAM dataset for English handwriting and the ICFHR 2012 dataset for Arabic handwriting.
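
    Of the classifiers compared, K-Nearest-Neighbour is the simplest to illustrate. The sketch below is a minimal pure-Python version using majority voting. The two-dimensional feature vectors and writer labels are hypothetical, and the sketch says nothing about how the study extracted features or tuned k.

```python
import math
from collections import Counter

def knn_predict(query, training_set, k=3):
    """Predict a label by majority vote among the k nearest training samples.

    training_set is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    neighbours = sorted(training_set, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical training samples: toy 2-D features for two writers.
training_set = [
    ((0.1, 0.2), "writer_a"),
    ((0.2, 0.1), "writer_a"),
    ((0.0, 0.3), "writer_a"),
    ((0.9, 0.8), "writer_b"),
    ((0.8, 0.9), "writer_b"),
    ((1.0, 0.7), "writer_b"),
]

prediction = knn_predict((0.15, 0.15), training_set)
```

    KNN has no training phase beyond storing the samples, which makes it a convenient baseline against models such as an SVM or a CNN.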

    Measuring and optimising performance of an offline text writer identification system in terms of dimensionality reduction techniques

    Most data generated in the real world, such as images, speech signals or fMRI scans, has a high dimensionality. Processing high-dimensional data increases complexity in both execution time and memory usage, so dimensionality reduction techniques can be used to reduce the number of variables in the data and thereby improve system performance. In previous work, we developed an offline writer identification system using a combination of Oriented Basic Image (OBI) features and the concept of a grapheme codebook. In order to measure and optimise the system performance, a variety of nonlinear dimensionality reduction algorithms have been used: Kernel Principal Component Analysis (KPCA), Isomap, Locally Linear Embedding (LLE), Hessian LLE and Laplacian Eigenmaps. The performance was evaluated on the IAM dataset for English handwriting and the ICFHR 2012 dataset for Arabic handwriting. The results indicate that the system performed better with KPCA than with the other reduction techniques investigated in this work.

    Offline Writer Identification using Deep Convolution Neural Network

    Deep convolutional neural networks (DCNN) are efficient in solving different pattern recognition problems and have been applied to extract image features (IFs). This paper investigates using deep learning (DL) techniques to improve the performance of the writer identification (WI) process. It presents a novel approach for WI tasks that combines a DL technique with machine learning (ML): a convolutional neural network (CNN) is employed as a feature extractor, and an ML algorithm classifies the extracted features. The standard Alex-Net model is used to extract the IFs that are located in the fully connected layers (FCLs). The support vector machine (SVM) model is selected as the classifier because of its ability to improve identification performance (IP). The proposed model is tested on several datasets, namely the Islamic Heritage Project (IHP) and Clusius. Furthermore, the IAM and ICFHR-2012 datasets have been employed for benchmarking the proposed model. The results demonstrate that the model achieves superior performance.